Practice with ggplot2

Where should I put the aes() bit?

If you put it at the “top level” inside ggplot(aes(...)), the mapping will apply to all layers. For example:

bears %>% 
  count(month) %>%
  ggplot(aes(x = month, y = n)) +
  geom_point() + 
  geom_line()

In contrast, if you put the aes() mapping inside a single geometry layer, it applies only to that layer. For example, this causes an error because geom_line() has no aesthetic mapping of its own and there is no top-level mapping to inherit:

bears %>% 
  count(month) %>% 
  ggplot() +
  geom_point(aes(x = month, y = n)) + 
  geom_line()
#> Error in `geom_line()`:
#> ! Problem while setting up geom.
#> ℹ Error occurred in the 2nd layer.
#> Caused by error in `compute_geom_1()`:
#> ! `geom_line()` requires the following missing aesthetics: x and y
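
One way to avoid this error is to give every layer its own mapping. The sketch below uses the built-in mpg data rather than bears, and counting by cyl is only an illustrative choice:

```r
library(ggplot2)
library(dplyr)

# Count rows per number of cylinders (illustrative; any numeric x works)
cyl_counts <- mpg %>%
  count(cyl)

# Each layer carries its own aes(), so neither relies on a top-level mapping
p_layers <- cyl_counts %>%
  ggplot() +
  geom_point(aes(x = cyl, y = n)) +
  geom_line(aes(x = cyl, y = n))

p_layers
```

Repeating the mapping in every layer works, but when all layers share the same x and y it is usually less typing to put aes() at the top level.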

Main geoms

geom_point()

Basic scatterplot:

mpg %>% 
  ggplot() +
  geom_point(aes(x = displ, y = hwy))

Change color for all points:

mpg %>% 
  ggplot() +
  geom_point(aes(x = displ, y = hwy), color = 'blue')

To change color based on a variable, map the variable to color in aes():

mpg %>% 
  ggplot() +
  geom_point(aes(x = displ, y = hwy, color = class)) 
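
A common slip is to put a fixed color inside aes(). The sketch below contrasts the two cases on the same data; the object names p_set and p_mapped are only for illustration:

```r
library(ggplot2)
library(dplyr)

# Setting: color sits outside aes(), so every point is drawn in blue
p_set <- mpg %>%
  ggplot() +
  geom_point(aes(x = displ, y = hwy), color = "blue")

# Mapping by mistake: inside aes(), "blue" is treated as a one-value variable,
# so ggplot2 picks a color from its default palette and adds a legend
p_mapped <- mpg %>%
  ggplot() +
  geom_point(aes(x = displ, y = hwy, color = "blue"))

p_set
p_mapped
```

The rule of thumb: variables go inside aes(), fixed values go outside it.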

Map the shape instead of color (usually not a great idea):

mpg %>% 
  ggplot() +
  geom_point(aes(x = displ, y = hwy, shape = class)) 

What happened to SUV?
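
The suv points are missing because ggplot2's default shape palette supplies only six shapes and class has seven values, so the last one gets no shape and is dropped with a warning. If you really need seven shapes, one option is to supply them yourself with scale_shape_manual(); the shape codes 0 to 6 below are an arbitrary choice:

```r
library(ggplot2)
library(dplyr)

# Provide one shape code per class so that no class is left without a shape
p_shapes <- mpg %>%
  ggplot() +
  geom_point(aes(x = displ, y = hwy, shape = class)) +
  scale_shape_manual(values = 0:6)

p_shapes
```

Even with all seven shapes drawn, they are hard to tell apart, which is why mapping class to shape is usually not a great idea.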

geom_line() vs. geom_smooth()

geom_line() connects all the dots:

mpg %>% 
  ggplot() +
  geom_line(aes(x = displ, y = hwy))

This looks messy because geom_line() connects every point in order of x, from left to right, and many cars share the same displ value.

If you want a single “best-fit” trend line instead, use geom_smooth():

mpg %>% 
  ggplot() +
  geom_smooth(aes(x = displ, y = hwy))

Set se = FALSE to drop the confidence band:

mpg %>% 
  ggplot() +
  geom_smooth(aes(x = displ, y = hwy), se = FALSE)
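
By default geom_smooth() draws a flexible curve (loess for a dataset of this size). If a straight best-fit line is what you are after, a minimal sketch is to ask for a linear model with method = "lm":

```r
library(ggplot2)
library(dplyr)

# method = "lm" fits a straight line of hwy on displ instead of a loess curve
p_lm <- mpg %>%
  ggplot() +
  geom_smooth(aes(x = displ, y = hwy), method = "lm", se = FALSE)

p_lm
```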

geom_col()

For these examples, I first create a smaller summary data frame that counts the rows in each class:

mpg %>% 
  count(class)
#> # A tibble: 7 × 2
#>   class          n
#>   <chr>      <int>
#> 1 2seater        5
#> 2 compact       47
#> 3 midsize       41
#> 4 minivan       11
#> 5 pickup        33
#> 6 subcompact    35
#> 7 suv           62

Basic bar plot of the counts:

mpg %>% 
  count(class) %>% 
  ggplot() +
  geom_col(aes(x = class, y = n), width = 0.7) # width sets the bar width (default is 0.9)

Re-order bars based on count using reorder():

mpg %>% 
  count(class) %>% 
  ggplot() +
  geom_col(aes(x = reorder(class, n), y = n), width = 0.7)

To change the color for all bars, use fill (not color):

mpg %>% 
  count(class) %>% 
  ggplot() +
  geom_col(aes(x = reorder(class, n), y = n), fill = 'blue', width = 0.7)

To change color based on a variable, map the variable to fill in aes():

mpg %>% 
  count(class, drv) %>% # Note I had to include drv in the count too 
  ggplot() +
  geom_col(aes(x = reorder(class, n), y = n, fill = drv), width = 0.7)
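
One subtlety here: reorder() orders class by the mean of n by default, and after count(class, drv) each class can have several rows, so the bar order can differ from the order of the stacked totals. A sketch of one way to order by the totals instead is to pass FUN = sum:

```r
library(ggplot2)
library(dplyr)

class_drv <- mpg %>%
  count(class, drv)

# FUN = sum orders the classes by their total count across all drv values
p_total <- class_drv %>%
  ggplot() +
  geom_col(
    aes(x = reorder(class, n, FUN = sum), y = n, fill = drv),
    width = 0.7) +
  labs(x = "class") # replace the long default axis title

p_total
```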

Use position = 'dodge' to change from stacked to side-by-side:

mpg %>% 
  count(class, drv) %>% # Note I had to include drv in the count too 
  ggplot() +
  geom_col(
    aes(x = reorder(class, n), y = n, fill = drv),
    position = "dodge", width = 0.7)

Practice

mpg %>% 
  ggplot() +
  geom_smooth(aes(x = displ, y = hwy, color = drv))

mpg %>% 
  count(class, drv) %>% 
  ggplot() +
  geom_col(aes(x = drv, y = n, fill = class), width = 0.7)

mpg %>% 
  ggplot(aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) + 
  geom_smooth(se = FALSE)

Facets

Facets make multiple small charts and are useful when you have many levels in a categorical variable.

For example, this plot has too many color categories for the color to be useful:

mpg %>%
  ggplot(aes(x = displ, y = hwy)) +
  geom_point(aes(color = class))

Instead, we can use facet_wrap() to show one small chart for each vehicle class:

mpg %>%
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~class)
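
facet_wrap() chooses the grid layout for you. As a small sketch, the nrow (or ncol) argument lets you control it; two rows is just an example value:

```r
library(ggplot2)
library(dplyr)

# Arrange the seven class panels in two rows instead of the default layout
p_facets <- mpg %>%
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~class, nrow = 2)

p_facets
```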

You can also use facet_grid() to facet by two variables:

mpg %>%
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ cyl)

Extra Practice

bears %>%
  count(year, gender) %>%
  ggplot() +
  geom_col(aes(x = year, y = n, fill = gender)) +
  labs(
    x     = "Year",
    y     = "Number of killings",
    fill  = "Victim gender",
    title = "Annual deadly bear attacks over time"
  ) +
  theme_bw()

mpg %>%
  mutate(manufacturer = str_to_title(manufacturer)) %>%
  group_by(manufacturer) %>%
  summarise(mean_hwy = mean(hwy)) %>%
  ggplot() +
  geom_col(aes(x = mean_hwy, y = reorder(manufacturer, mean_hwy)), width = 0.9) +
  labs(
    x = 'Highway fuel economy (mpg)',
    y = 'Vehicle manufacturer',
    title = 'Mean fuel economy by automaker'
  ) +
  theme_minimal()